Bayesian SEM and Big Data: An Unhappy Marriage?

Sara van Erp

Utrecht University

Bayesian SEM and big data: Three problems


No automatic sparsity


No flexible user-friendly software

Computationally expensive

Setting the scene

Focus on two situations:

  1. “Large” data sets & models with many variables, e.g., mediation or MIMIC models
  2. “Small” data sets & models with many parameters compared to the sample size, e.g., CFA with cross-loadings


In both cases, \(p/n\) is high, which causes problems in terms of overfitting and uninterpretable models.

Solution to overfitting: Regularization

In models with many parameters, regularization is increasingly used to prevent overfitting.

Regularized SEM adds a penalty term to the parameters to be regularized, e.g., ridge, lasso, or elastic net:

\[ F_{regsem}(S, \Sigma(\theta)) = F(S, \Sigma(\theta)) + \lambda P(\theta_{reg}) \]

See, for example, Jacobucci, Grimm, and McArdle (2016) or Huang (2020).
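As a toy illustration of the objective above (using an ordinary least-squares discrepancy in place of the SEM fit function \(F\), and a ridge penalty; this is only a sketch of the idea, not the regsem implementation):

```r
set.seed(1)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- X %*% c(1, 0, 0.5) + rnorm(100)

# penalized objective: discrepancy F plus ridge penalty lambda * P(theta)
penalized_fit <- function(theta, lambda) {
  sum((y - X %*% theta)^2) + lambda * sum(theta^2)
}

# larger lambda pulls all estimates towards zero
est0   <- optim(rep(0, 3), penalized_fit, lambda = 0)$par
est100 <- optim(rep(0, 3), penalized_fit, lambda = 100)$par
```

With \(\lambda = 0\) this reproduces the unpenalized (OLS) estimates; increasing \(\lambda\) shrinks the estimates towards zero.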

Bayesian regularized SEM

A shrinkage prior takes the role of the penalty:

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]

Ideal shrinkage prior:

  1. Peaked around zero
  2. Heavy tails
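A well-known shrinkage prior combining both properties is the horseshoe, which places a half-Cauchy distribution on the local scales:

\[ \theta_j \mid \lambda_j, \tau \sim N(0, \lambda_j^2 \tau^2), \qquad \lambda_j \sim C^+(0, 1) \]

The global scale \(\tau\) pulls all parameters towards zero, while the heavy-tailed local scales \(\lambda_j\) allow large effects to escape the shrinkage.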

Many different shrinkage priors exist (see e.g., Van Erp, Oberski, and Mulder (2019)) and several have been applied in SEM (see Van Erp (2023) for an overview).

Advantages of Bayesian regularized SEM

  • Intuitive interpretation
  • Automatic uncertainty estimates
  • Incorporation of prior information
  • Flexibility in the choice of shrinkage prior
  • Automatic estimation of the penalty parameter

Problem 1: No automatic sparsity


In Bayesian regularized SEM, parameters are not automatically set to zero.

  • Van Erp, Oberski, and Mulder (2019) showed that different conditions require different credible interval criteria in linear regression models
  • Zhang, Pan, and Ip (2021) showed that different conditions require different (arbitrary) types of selection criteria in SEM
  • As the dimensionality grows, parameters are pulled more heavily towards zero and marginal selection criteria can fail

Example: MIMIC model

MIMIC model drawn with https://semdiag.psychstat.org

Potential solution 1: Projection predictive variable selection for SEM

Joint work (in progress) with Paul Bürkner and Aki Vehtari.

Goal: find a smaller submodel that predicts practically as well as the larger reference model.

  1. Specify a reference model
  2. Project the posterior information of the reference model onto the candidate models
  3. Select the candidate model with the best predictive performance

See e.g., Piironen and Vehtari (2017), Pavone et al. (2020), Piironen, Paasiniemi, and Vehtari (2020), or McLatchie et al. (2023)

Projection predictive variable selection: Technical details (1)

Idea behind posterior projection: replace the reference model posterior \(p(\theta_*|D)\) with a projected posterior \(q_\perp(\theta)\) under a simpler, restricted model.


Implementation: minimize the KL divergence between the induced predictive distributions:

\[ \theta_\perp = \underset{\theta}{\arg\min}\; \text{KL}\!\left(p(\tilde{y} \mid \theta_*) \,\|\, p(\tilde{y} \mid \theta)\right) \]

Projection predictive variable selection: Technical details (2)

Practical implementation: instead of numerical optimization, an analytical solution is available for Gaussian linear models (Piironen, Paasiniemi, and Vehtari 2020):

\[ \beta_{proj} = (X^T X)^{-1} X^T \mu_* \] where \(\mu_*\) denotes the predictions based on the reference model and \(X\) the design matrix of the restricted model.
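As an illustration of this formula only (not the projpred implementation), the projection amounts to regressing the reference model predictions on the submodel design matrix:

```r
set.seed(1)
n <- 200
X_ref <- cbind(1, matrix(rnorm(n * 5), n, 5))  # reference design: intercept + 5 predictors
beta_ref <- c(0.5, 1, 0, 0.8, 0, 0)            # illustrative posterior mean of the reference model
mu_star <- X_ref %*% beta_ref                  # predictions of the reference model

# candidate submodel: intercept plus predictors 1 and 3
X_sub <- X_ref[, c(1, 2, 4)]

# beta_proj = (X^T X)^{-1} X^T mu_star
beta_proj <- solve(t(X_sub) %*% X_sub, t(X_sub) %*% mu_star)
```

Because the reference predictions happen to lie exactly in the span of this submodel, the projection recovers the nonzero coefficients (0.5, 1, 0.8); in general it yields the submodel coefficients whose predictions are closest to those of the reference model.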

Projection predictive variable selection: Technical details (3)

  1. Forward search to find candidate models
  2. Evaluate predictive performance based on projected (clustered) posterior draws
  3. Select the model whose predictive performance is closest to that of the reference model.


Available in the R package projpred (Piironen et al. 2023).

Projection predictive variable selection for the MIMIC model

library(lavaan)
library(brms)
library(projpred)

# one-factor model with 10 covariates predicting the factor (MIMIC model)
mod <- 'F =~ y1 + y2 + y3 + y4 + y5
        F ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10'

# obtain factor scores from the fitted SEM
fit.lavaan <- sem(mod, data = df)
fs <- lavPredict(fit.lavaan, method = "Bartlett")
df$fs <- as.vector(fs)

# step 1: reference model regressing the factor scores on all covariates,
# with a horseshoe shrinkage prior on the coefficients (one possible choice)
prior_hs <- prior(horseshoe(1), class = "b")
refm_fit <- brm(fs ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
                data = df,
                prior = prior_hs)
refm_obj <- get_refmodel(refm_fit)

# steps 2-3: cross-validated forward search and evaluation of candidate models
cvvs <- cv_varsel(
  refm_obj,
  cv_method = "kfold",
  K = 10
)
plot(cvvs)

Example results: 10 covariates, with only x5 and x7 being relevant

Sensitivity and false discovery rates (excluding small effects)

Predictive performance

Potential solution 1: Projection predictive variable selection for SEM

  • Important caveat: these results are based on < 100 replications, without cross-validating the search part.
  • The suggest_size heuristic in projpred can be influential.
  • projpred seems to perform well especially in very sparse settings.
  • In low-dimensional settings, marginal threshold criteria might suffice.
  • Further simulation studies into high-dimensional settings are ongoing.
  • All simulation results so far are available on the OSF.

Problem 2: No flexible user-friendly software

Software overview of (Bayesian) regularized SEM (Van Erp 2023)

Potential solution 2: Approximate Bayesian regularization

See Karimova et al. (preprint) on the OSF

Idea: Approximate the likelihood with a multivariate normal centered around the MLE.


Step 1: Fit the model and extract the unregularized estimates and error covariance matrix
Step 2: Fit an approximate regularized Bayesian model with a shrinkage prior:  \(p(\theta, \lambda|D) \propto N(\theta|\hat{\theta}_{MLE}, \hat{\Sigma}_{\theta}) p(\theta|\lambda) p(\lambda)\)

 

Implemented in shrinkem.
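For intuition: with a normal (ridge) shrinkage prior and a fixed prior variance, step 2 has a closed form. The following base-R sketch illustrates the idea with made-up numbers (shrinkem itself uses MCMC and more flexible priors such as the horseshoe):

```r
# step 1 output (illustrative numbers): unregularized estimates + error covariance
theta_mle <- c(0.8, -0.05, 0.3)
Sigma_mle <- diag(0.04, 3)
tau2 <- 0.1  # fixed ridge prior variance: theta_j ~ N(0, tau2)

# normal approximation of the likelihood x normal prior => normal posterior:
# precision_post = Sigma^{-1} + I/tau2,  mean_post = cov_post %*% Sigma^{-1} %*% theta_mle
prec_post <- solve(Sigma_mle) + diag(3) / tau2
mean_post <- solve(prec_post, solve(Sigma_mle) %*% theta_mle)
```

Here every estimate is shrunken towards zero by the same factor \((1/0.04)/(1/0.04 + 1/0.1) \approx 0.71\); a horseshoe prior would instead shrink small estimates much more strongly than large ones.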

shrinkem illustration 1 - Restricted factor analysis

Restricted factor model for the Holzinger & Swineford (1939) data visualized using semPlot. See also Liang and Jacobucci (2019).

shrinkem illustration 1 - Restricted factor analysis

library(lavaan)
library(shrinkem)

# fit_lav: the restricted factor model fitted with lavaan (previous slide)
# extract the MLEs and covariance matrix for the regression parameters on age
mle <- coef(fit_lav)
mleSel <- mle[grep("~age", names(mle))]
covmat <- lavInspect(fit_lav, what = "vcov")
id <- grep("~age", colnames(covmat))
covmatSel <- covmat[id, id]

# approximate Bayesian regularization with a horseshoe shrinkage prior
shrink_hs <- shrinkem(mleSel, covmatSel, type = "horseshoe")
summary(shrink_hs)

shrinkem illustration 1 - Restricted factor analysis

[1] "Start burnin ... "
================================================================================
[1] "Start posterior sampling ... "
================================================================================
Call:
shrinkem.default(x = mleSel, Sigma = covmatSel, type = "horseshoe")

                    est shrunk.mean shrunk.median shrunk.mode shrunk.lower
t01_visperc~age  -0.060      -0.035        -0.035      -0.003       -0.135
t02_cubes~age     0.011       0.006         0.006       0.001       -0.069
t03_frmbord~age   0.066       0.039         0.039       0.003       -0.035
t04_lozenges~age  0.079       0.052         0.052       0.005       -0.026
t05_geninfo~age  -0.191      -0.076        -0.076      -0.007       -0.219
t06_paracomp~age -0.214      -0.100        -0.100      -0.085       -0.241
t07_sentcomp~age -0.259      -0.148        -0.148      -0.128       -0.283
t08_wordclas~age -0.271      -0.168        -0.168      -0.162       -0.301
t09_wordmean~age -0.165      -0.052        -0.052      -0.003       -0.188
t10_addition~age  0.159       0.115         0.115       0.117        0.000
t11_code~age      0.034       0.009         0.009       0.000       -0.062
t12_countdot~age  0.242       0.211         0.211       0.214        0.102
t13_sccaps~age    0.092       0.048         0.048       0.004       -0.026
t14_wordrecg~age -0.030      -0.016        -0.016      -0.001       -0.110
t15_numbrecg~age  0.069       0.035         0.035       0.002       -0.037
t16_figrrecg~age -0.096      -0.066        -0.066      -0.008       -0.182
t17_objnumb~age   0.136       0.099         0.099       0.088       -0.005
t18_numbfig~age   0.075       0.041         0.041       0.003       -0.034
t19_figword~age  -0.051      -0.026        -0.026      -0.001       -0.129
                 shrunk.upper nonzero
t01_visperc~age         0.033       0
t02_cubes~age           0.090       0
t03_frmbord~age         0.146       0
t04_lozenges~age        0.163       0
t05_geninfo~age         0.017       0
t06_paracomp~age        0.006       0
t07_sentcomp~age       -0.023       1
t08_wordclas~age       -0.041       1
t09_wordmean~age        0.033       0
t10_addition~age        0.239       0
t11_code~age            0.094       0
t12_countdot~age        0.321       1
t13_sccaps~age          0.161       0
t14_wordrecg~age        0.056       0
t15_numbrecg~age        0.140       0
t16_figrrecg~age        0.019       0
t17_objnumb~age         0.223       0
t18_numbfig~age         0.156       0
t19_figword~age         0.045       0

Package    Time needed
blavaan    21.5s
shrinkem   10.3s

shrinkem illustration 2 - Mediation analysis

Mediation model drawn with https://semdiag.psychstat.org



Suppose we wish to regularize the indirect paths \(ab\) directly.

shrinkem illustration 2 - Mediation analysis

library(shrinkem)
library(blavaan)

# model specification: nM mediators M1, ..., MnM between X and Y
M <- paste0("M", 1:nM, " ~ X")
Y <- paste("Y ~", paste(paste0("M", 1:nM), collapse = " + "), "+ X")
lavmod <- paste(c(M, Y), collapse = "\n")

# fit the unregularized model with blavaan
defaultPriors <- dpriors(beta = "normal(0, 10000)")
fit_blav <- bsem(lavmod,
                 data = lav_train,
                 std.lv = TRUE,
                 dp = defaultPriors)

draws <- as.matrix(blavInspect(fit_blav, what = "mcmc"))

# compute posterior draws of the indirect effects ab = a * b
draws_ab <- matrix(NA, nrow = nrow(draws), ncol = nM)
for (i in 1:nM) {
  draws_a <- draws[, grep(paste0("M", i, "~X"), colnames(draws), fixed = TRUE)]
  draws_b <- draws[, grep(paste0("Y~M", i, "$"), colnames(draws))]
  draws_ab[, i] <- draws_a * draws_b
}

# use the posterior means and covariance matrix of the indirect effects
# as input for the approximate regularization
mle <- colMeans(draws_ab)
covmat <- cov(draws_ab)

# shrinkem with horseshoe prior
shrink_hs <- shrinkem(mle, covmat, type = "horseshoe", iterations = 4000)
summary(shrink_hs)

shrinkem illustration 2 - Mediation analysis

Potential solution 2: Approximate Bayesian regularization

Two advantages:
- Faster than “exact” MCMC
- More flexible than classical regularization in terms of models (e.g., REMs, SEMs, GGNs) and priors (ridge, (group) lasso, horseshoe)

 

Two disadvantages:
- Relies on the accuracy of the normal approximation
- Depends on the availability of MLEs

Problem 3: Computationally expensive



shrinkem offers a partial solution, but what if MLEs are not available?


Other approximate Bayesian algorithms might offer a solution.


See tomorrow’s talk (10:20h) on “Posterior Estimators in Highly Dimensional Bayesian Penalised Regression”.

Three problems and potential solutions


No automatic sparsity: ProjpredSEM

No flexible user-friendly software: shrinkem

Computationally expensive: shrinkem and other approximate algorithms

Bayesian SEM and big data: An unhappy marriage?

Questions/ideas?


Feel free to reach out during this conference, or via e-mail: s.j.vanerp@uu.nl.


These slides are available online at: https://github.com/sara-vanerp/presentations

References

Huang, Po-Hsien. 2020. “lslx: Semi-Confirmatory Structural Equation Modeling via Penalized Likelihood.” Journal of Statistical Software 93 (7). https://doi.org/10.18637/jss.v093.i07.
Jacobucci, Ross, Kevin J. Grimm, and John J. McArdle. 2016. “Regularized Structural Equation Modeling.” Structural Equation Modeling: A Multidisciplinary Journal 23 (4): 555–66. https://doi.org/10.1080/10705511.2016.1154793.
Karimova, Diana, Sara van Erp, Roger Leenders, and Joris Mulder. Preprint. “Honey, I Shrunk the Irrelevant Effects! Simple and Fast Approximate Bayesian Regularization.” OSF.
Liang, Xinya, and Ross Jacobucci. 2019. “Regularized Structural Equation Modeling to Detect Measurement Bias: Evaluation of Lasso, Adaptive Lasso, and Elastic Net.” Structural Equation Modeling: A Multidisciplinary Journal 27 (5): 722–34. https://doi.org/10.1080/10705511.2019.1693273.
McLatchie, Yann, Sölvi Rögnvaldsson, Frank Weber, and Aki Vehtari. 2023. “Robust and Efficient Projection Predictive Inference.” arXiv. http://arxiv.org/abs/2306.15581.
Pavone, Federico, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. 2020. “Using Reference Models in Variable Selection.” arXiv. http://arxiv.org/abs/2004.13118.
Piironen, Juho, Markus Paasiniemi, Alejandro Catalina, Frank Weber, and Aki Vehtari. 2023. “projpred: Projection Predictive Feature Selection.” https://mc-stan.org/projpred/.
Piironen, Juho, Markus Paasiniemi, and Aki Vehtari. 2020. “Projective Inference in High-Dimensional Problems: Prediction and Feature Selection.” Electronic Journal of Statistics 14 (1). https://doi.org/10.1214/20-EJS1711.
Piironen, Juho, and Aki Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35. https://doi.org/10.1007/s11222-016-9649-y.
Van Erp, Sara. 2023. “Bayesian Regularized SEM: Current Capabilities and Constraints.” Psych 5 (3): 814–35. https://doi.org/10.3390/psych5030054.
Van Erp, Sara, Daniel L. Oberski, and Joris Mulder. 2019. “Shrinkage Priors for Bayesian Penalized Regression.” Journal of Mathematical Psychology 89 (April): 31–50. https://doi.org/10.1016/j.jmp.2018.12.004.
Zhang, Lijin, Junhao Pan, and Edward Haksing Ip. 2021. “Criteria for Parameter Identification in Bayesian Lasso Methods for Covariance Analysis: Comparing Rules for Thresholding, p -Value, and Credible Interval.” Structural Equation Modeling: A Multidisciplinary Journal 28 (6): 941–50. https://doi.org/10.1080/10705511.2021.1945456.